Effect of assimilating CO2 observations in the Korean Peninsula on the inverse modeling to estimate surface CO2 flux over Asia

To investigate the impact of two CO2 observation datasets obtained from the Korean Peninsula on surface CO2 flux estimation over Asia, the two datasets are assimilated into the CarbonTracker (CT) inverse modeling system and the estimated surface CO2 fluxes are analyzed. The Anmyeon-do (AMY) and Gosan (GSN) sites in the Korean Peninsula have observed surface CO2 mole fraction since the late 1990s. Two experiments are conducted: the reference experiment (CNTL) assimilates only observations provided by the National Oceanic and Atmospheric Administration (NOAA), while the other experiment (EXP1) assimilates both the NOAA observations and the two Korean observation datasets. The results are analyzed over Asia for the 9 years from 2003 to 2011 because both the AMY and GSN datasets are almost complete for this period. Compared to CNTL, the annual average biosphere CO2 flux estimated in EXP1 shows more absorption in summer and less emission from fall to spring, mainly in the Eurasia Temperate and Eurasia Boreal regions. When model results are compared to independent CO2 concentration data from surface stations and aircraft, the root mean square error is smaller for EXP1 than for CNTL. EXP1 yields a larger reduction in the uncertainty of the estimated biosphere CO2 flux over Asia, and the observation impact of the AMY and GSN sites on flux estimation is approximately 11%, which is greater than that of other observation sites around the world. Therefore, the two CO2 observation datasets in the Korean Peninsula are useful in reducing uncertainties in CO2 flux estimation at regional as well as global scales.


Introduction
The annual mean surface air temperature of the Earth has increased since the second industrial revolution in the late 19th century. The global mean surface temperature (GMST) of the last decade (2011-2020) was approximately 1.09°C higher than that of the pre-industrial period (1850-1900) [1]. The Paris Agreement adopted in 2015 aims to limit the GMST rise to below 2°C, or 1.5°C if possible, compared to pre-industrial levels [2]. However, an Intergovernmental Panel on Climate Change (IPCC) special report has stated that the temperature increase could surpass 1.5°C around the middle of this century without additional emission reductions, even if the current nationally determined contributions (NDCs) are achieved [3]. Therefore, it is important to estimate the sources and sinks of greenhouse gases precisely, both to support emission policies with scientific information and to manage the risks from climate change. CO2 is the most abundant long-lived greenhouse gas in the troposphere. Since [4] attempted to estimate surface CO2 flux by assimilating observed CO2 mole fraction data into a model, many researchers have studied surface CO2 flux optimization using the data assimilation (DA) approach. Surface CO2 flux estimation is an inverse modeling problem and is underdetermined: the fluxes must be inferred from a relatively small number of observations. Thus, it requires prior surface CO2 flux information and, more importantly, observation data accumulated over a long period in order to estimate the surface CO2 fluxes more precisely. Observations from the surface (i.e., flask and in situ observations) are commonly used in inverse modeling. The spatial density of surface CO2 observations is relatively high in North America and Europe, giving more reliable surface CO2 flux information for these two continents [5].
On the other hand, observation densities in Asia, Africa, and South America are too low to constrain the surface CO2 fluxes well. To compensate for the low observation density in Asia, inverse modeling studies have assimilated tower observations in the Siberian region and aircraft observations over the globe into inverse models [6-9]. Satellite column CO2 (XCO2) data have also been used to supplement the spatial coverage of surface CO2 observations. The Greenhouse Gases Observing Satellite (GOSAT; [10,11]) XCO2 data have been used in various inverse modeling studies [12-21]. The Orbiting Carbon Observatory-2 satellite (OCO-2; [22,23]) XCO2 data have also been used in several studies [15,19,24-26]. Despite the broad spatial coverage of satellite XCO2 data, surface CO2 flux estimates based on them show large uncertainty depending on observation coverage, number of data, and retrieval algorithm [15,17,19]. Those studies have shown some improvements in optimizing the surface CO2 flux in Asia, but more observation data are required to estimate the surface CO2 fluxes more precisely in terms of their spatial and temporal patterns as well as the total annual budgets of the different ecoregions over Asia.
Meanwhile, each country party to the Paris Agreement is obliged to set its own contribution to greenhouse gas emission reduction and to report its annual emission inventory to the United Nations Framework Convention on Climate Change (UNFCCC) so that the achieved reduction can be checked. The World Meteorological Organization (WMO) has established the Integrated Global Greenhouse Gas Information System (IG3IS) platform, which combines traditional inventory reporting with inverse modeling results to help countries prepare inventory reports with smaller uncertainties. [27] applied CO2 mole fraction data observed at two surface in situ observatories and from shipboard measurements across New Zealand to an inverse model, and revealed that the modeled CO2 flux absorption by indigenous forests in New Zealand is stronger than the absorption calculated from the inventory report. In the UK, CO2 observations from the tower network were assimilated into an inverse model, and the estimated CO2 flux from the biosphere appeared to be balanced (net zero), which differs from the inventory report [28]. Regional-scale observation networks have been established in Switzerland and in Paris, France, to assist regional greenhouse gas emission estimation. As securing enough observations for surface CO2 flux optimization becomes more important, more Asian observations need to be utilized in inverse modeling.
In this study, CO2 observation data from Anmyeon-do (AMY) and Gosan (GSN), located in the Korean Peninsula, are assimilated into CarbonTracker (CT), and the effect of the two observation datasets on Asian surface CO2 flux estimation is investigated. The AMY and GSN sites have accumulated observation data over more than 9 years, which could provide flux information over the Korean Peninsula and its surrounding Asian regions. The model and observations used are introduced in section 2. In section 3, the estimated CO2 fluxes in Asia from the 9-year (2003-2011) assimilation experiments using the two additional observation datasets and the effects of assimilating AMY and GSN data into the inverse modeling are discussed. Finally, section 4 presents the summary and conclusions.

Inverse modeling system
CT is a global-scale inverse model [29] developed to estimate surface CO2 fluxes using CO2 observations as a constraint. To assimilate CO2 mole fraction observations, CT adopts an ensemble Kalman filter (EnKF) DA method. Because the observation sites are too sparse to cover the whole globe, prior flux information needs to be given in advance. The first guess of the CO2 flux is a combination of four prior flux components:

$$F(x, y, t) = \lambda_r \cdot F_{bio}(x, y, t) + \lambda_r \cdot F_{ocn}(x, y, t) + F_{fire}(x, y, t) + F_{ff}(x, y, t), \quad (1)$$

where $F_{bio}$, $F_{ocn}$, $F_{fire}$, and $F_{ff}$ represent the CO2 flux from terrestrial ecosystem activity (CASA GFED v3.1: [30,31]), atmosphere-ocean CO2 exchange [32], biomass burning from forest fires [30,31], and fossil fuel combustion (Carbon Dioxide Information and Analysis Center [CDIAC] database: [33]), respectively. Since CO2 emissions from fossil fuel use and forest fires are prescribed, it is important to find precise biosphere and ocean flux estimates to optimize the total CO2 flux. In CT, the scaling factor $\lambda_r$ is updated through DA and used to optimize these two CO2 flux components (i.e., biosphere and ocean fluxes). Each scaling factor corresponds to a specific ecosystem region called an ecoregion. Combining the Transcom regions (i.e., 11 regions) with the ecosystem types (i.e., 19 types) gives 209 possible land ecoregions, of which 126 are used. These 126 land ecoregions and the 30 ocean ecoregions [34] are each paired with a scaling factor, and the scaling factors update the flux values. The unrealistic combinations of Transcom regions and ecosystem types among the 209 (e.g., mangrove in the Eurasia Boreal) are not included in the 126 land ecoregions. The total of 156 ecoregions is shown on the CarbonTracker website (https://gml.noaa.gov/ccgg/carbontracker/CT2013B_doc.pdf).
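As a minimal illustration of Eq (1), the sketch below composes the total flux for one grid cell and one time step from the four prior components. The function name, the argument names, and the numbers are hypothetical, and the sketch passes separate scaling factors for the land and ocean terms as an assumption; it is not CarbonTracker code.

```python
def total_flux(lam_bio, lam_ocn, f_bio, f_ocn, f_fire, f_ff):
    """Total surface CO2 flux for one grid cell and time step, after Eq (1).

    lam_bio, lam_ocn: scaling factors of the ecoregions that contain the cell
    f_bio, f_ocn, f_fire, f_ff: prior flux components (same flux units)
    """
    # Only the biosphere and ocean terms are scaled; fire and fossil fuel
    # emissions are prescribed and pass through unchanged.
    return lam_bio * f_bio + lam_ocn * f_ocn + f_fire + f_ff


# Hypothetical prior components in arbitrary flux units (negative = uptake).
prior = total_flux(1.0, 1.0, f_bio=-2.0, f_ocn=-0.5, f_fire=0.1, f_ff=1.5)
# A biosphere scaling factor above 1 strengthens the biospheric uptake.
posterior = total_flux(1.2, 1.0, f_bio=-2.0, f_ocn=-0.5, f_fire=0.1, f_ff=1.5)
```

With all scaling factors equal to 1 the function returns the unmodified prior flux, which is the state the system relaxes to when no observations are assimilated.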
Transport Model 5 (TM5: [35]), an offline atmospheric chemical transport model, converts the surface CO2 flux from Eq (1) into model CO2 concentration (i.e., mole fraction). TM5 calculates the advection, convection, and vertical diffusion of CO2 using ERA-Interim reanalysis data from the European Centre for Medium-Range Weather Forecasts (ECMWF). TM5 works as an observation operator, calculating the model CO2 concentration that corresponds to each observed CO2 concentration at the time and location of the observation.
The EnKF scheme used in CT is the ensemble square root Kalman filter (EnSRF) adopted from [36]. The EnSRF updates the ensemble mean and the ensemble perturbations separately, as follows:

$$\bar{x}_a = \bar{x}_b + K (y^o - H \bar{x}_b), \quad (2)$$

$$x'_a = x'_b - \tilde{k} H x'_b, \quad (3)$$

where $\bar{x}$ is the ensemble mean of the state vector $x$, which in this study is the vector of scaling factors updated in the DA system, and $x'$ is the ensemble perturbation. Subscripts $a$ and $b$ denote analysis and background, respectively, and $y^o$ is the observation vector. $H$ is a linear observation operator projecting the model state vector onto the observation space, and TM5 works as $H$ in CT. $K$ and $\tilde{k}$ are the Kalman gain and the reduced Kalman gain, respectively:

$$K = P^b H^T (H P^b H^T + R)^{-1}, \quad (4)$$

$$\tilde{k} = \alpha K, \quad (5)$$

where $\alpha = \left(1 + \sqrt{R / (H P^b H^T + R)}\right)^{-1}$, $P^b$ is the model's background error covariance, and $R$ is the observation error covariance. $P^b H^T$ and $H P^b H^T$ are calculated using the equations below:

$$P^b H^T \approx \frac{1}{N-1} \sum_{i=1}^{N} x'_i (H x'_i)^T, \quad (6)$$

$$H P^b H^T \approx \frac{1}{N-1} \sum_{i=1}^{N} (H x'_i)(H x'_i)^T, \quad (7)$$

where $N$ is the number of ensemble members. It is necessary to prevent the amplification of sampling error caused by the limited number of ensemble members. A covariance localization technique [37] is applied in CT to exclude the effect of remote observations on the surface CO2 flux estimation, since observations far from the flux location are barely correlated with the flux concerned. Since no physical relationship exists between scaling factors, the correlations are calculated between the scaling factor deviations of the ensemble members and the corresponding modeled flux deviations. If the correlation fails the significance test, the scaling factor is not updated. Marine Boundary Layer (MBL) observations are exempt from the localization because MBL sites capture flux signals from a distance [34].
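The separate mean and perturbation updates can be sketched for the simplest case, a single scalar observation processed on its own. This is a minimal pure-Python illustration under stated assumptions: the function and variable names are hypothetical, the modeled observation values are passed in directly instead of being produced by a transport model, the reduction factor follows the standard EnSRF form for a scalar observation, and localization is omitted.

```python
import math


def ensrf_update(ensemble, hx, y_obs, r):
    """EnSRF update of an ensemble with one scalar observation.

    ensemble: list of N state vectors (lists of floats)
    hx:       list of N modeled observation values, one per member
    y_obs:    observed value
    r:        observation error variance
    """
    n, dim = len(ensemble), len(ensemble[0])
    mean = [sum(m[j] for m in ensemble) / n for j in range(dim)]
    hx_mean = sum(hx) / n
    dev = [[m[j] - mean[j] for j in range(dim)] for m in ensemble]
    hdev = [h - hx_mean for h in hx]

    # Sample estimates of H P^b H^T (scalar) and P^b H^T (vector).
    hph = sum(h * h for h in hdev) / (n - 1)
    ph = [sum(dev[i][j] * hdev[i] for i in range(n)) / (n - 1)
          for j in range(dim)]

    gain = [p / (hph + r) for p in ph]              # Kalman gain
    alpha = 1.0 / (1.0 + math.sqrt(r / (hph + r)))  # reduction factor
    reduced = [alpha * k for k in gain]             # reduced Kalman gain

    # The mean is updated with the full gain, the perturbations with the
    # reduced gain, so that the analysis spread is not underestimated.
    new_mean = [mean[j] + gain[j] * (y_obs - hx_mean) for j in range(dim)]
    return [[new_mean[j] + dev[i][j] - reduced[j] * hdev[i]
             for j in range(dim)] for i in range(n)]


# Hypothetical one-parameter example in which the observation equals the state.
members = [[0.8], [1.0], [1.2]]
updated = ensrf_update(members, hx=[0.8, 1.0, 1.2], y_obs=1.4, r=0.04)
post_mean = sum(m[0] for m in updated) / 3
post_var = sum((m[0] - post_mean) ** 2 for m in updated) / 2
```

In this toy case the background variance equals the observation error variance, so the analysis mean lands halfway between the background mean and the observation and the ensemble variance is halved, as the Kalman filter equations predict.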
In each week of the simulation period, the EnKF assimilates observations from the most recent week to update the scaling factors of five weeks: the most recent week and the four preceding weeks. As noted by [38], this time-lag scheme accounts for the fact that observations can contain signals of sources or sinks far from the observing sites. Previous studies using a five-week time lag [7,29,39-41] showed that it is appropriate for optimizing the surface CO2 flux in North America, Europe, and Asia.
In CT, the scaling factor for the upcoming analysis week is predicted by a simple model as follows:

$$\lambda^b_t = \frac{\lambda^a_{t-2} + \lambda^a_{t-1} + \lambda^p}{3}, \quad (8)$$

where $\lambda^b_t$ is the prior scaling factor for the upcoming analysis week $t$, and $\lambda^a_{t-2}$ and $\lambda^a_{t-1}$ are the posterior scaling factors of weeks $t-2$ and $t-1$, respectively. $\lambda^p$ is fixed at 1, so that the scaling factor relaxes toward 1 when no observations are assimilated.
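The forecast step described above can be sketched in a few lines. The sketch assumes that the prior is the plain average of the two preceding posterior scaling factors and the fixed value of 1, which is the form commonly given in the CarbonTracker literature; the function name and the numbers are hypothetical.

```python
def forecast_scaling_factor(lam_a_tm2, lam_a_tm1, lam_p=1.0):
    """Prior scaling factor for week t from the posteriors of weeks t-2 and t-1."""
    return (lam_a_tm2 + lam_a_tm1 + lam_p) / 3.0


# One forecast step from two hypothetical posterior values.
one_step = forecast_scaling_factor(1.3, 1.6)

# Without observations the posterior equals the prior, so repeated
# forecasting pulls the scaling factor back toward the fixed value of 1.
history = [1.6, 1.6]
for _ in range(30):
    history.append(forecast_scaling_factor(history[-2], history[-1]))
relaxed = history[-1]
```

The loop shows the damping behavior that the text attributes to the fixed term: a perturbed scaling factor decays to 1 within a few months of unobserved weeks.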

CO2 observation
In this study, two surface CO2 mole fraction observation datasets from the AMY (https://gaw.kishou.go.jp/search/file/0039-2014-1001-01-01-9999) and GSN (https://gaw.kishou.go.jp/search/file/0052-2025-1001-01-01-9999) sites in the Korean Peninsula are the additional observations assimilated in CT. The AMY station has been operated by the Korea Meteorological Administration (KMA) since 1998 and was designated a regional Global Atmosphere Watch (GAW) station a year later. Observations at the GSN station were started in 2002 by the National Institute of Environmental Research (NIER); KMA took over the operation in 2012 under the new acronym JGS (Jeju Gosan). Both sites are relatively remote from anthropogenic sources such as factories or residential areas, so their data are appropriate for representing the CO2 concentration in the Northeast Asia region. Both datasets are measured with a non-dispersive infrared (NDIR) analyzer, which logs quasi-continuous CO2 concentration.
In this study, AMY and GSN data are assimilated together with the other observation datasets from the Observation Package (ObsPack) product in CT. The ObsPack product, provided by the National Oceanic and Atmospheric Administration (NOAA) Earth System Research Laboratory (ESRL) [42], is a collection of CO2 observations from around the world. Diverse research institutes including NOAA, the Commonwealth Scientific and Industrial Research Organization (CSIRO), the National Center for Atmospheric Research (NCAR), and Environment and Climate Change Canada (ECCC) have provided observed data for the ObsPack product. Most ObsPack data are obtained by averaging the values observed between 12 and 16 local standard time (LST), since the TM5 model performs well in simulating the well-mixed daytime atmospheric layer. For observation sites located on mountaintops, observations between 00 and 04 LST are averaged because there is less chance of local biogenic or anthropogenic CO2 inflow from downslope areas during the nighttime [34]. Daily mean AMY and GSN data are also obtained by averaging the 12-16 LST data, following the ObsPack convention.
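The afternoon averaging described above can be sketched as follows. This is a minimal illustration, not the processing code used in the study: the function name and the record layout are hypothetical, timestamps are assumed to be already in LST, and the window is assumed to cover hours 12 through 15 (start inclusive, end exclusive).

```python
from collections import defaultdict
from datetime import datetime


def afternoon_daily_means(records, start_hour=12, end_hour=16):
    """Average quasi-continuous CO2 data into one afternoon value per day.

    records: iterable of (datetime in LST, CO2 mole fraction in ppm)
    Returns a dict mapping each date to its mean over the chosen hours.
    """
    buckets = defaultdict(list)
    for stamp, ppm in records:
        if start_hour <= stamp.hour < end_hour:
            buckets[stamp.date()].append(ppm)
    return {day: sum(vals) / len(vals) for day, vals in sorted(buckets.items())}


# Hypothetical hourly record for one day: 380 ppm plus the hour of day.
sample = [(datetime(2005, 7, 1, h), 380.0 + h) for h in range(24)]
daily = afternoon_daily_means(sample)
mean_july_1 = daily[datetime(2005, 7, 1).date()]
```

For a mountaintop site the same function could be called with `start_hour=0, end_hour=4` to mimic the nighttime window mentioned in the text.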
Model-data mismatch (MDM) (i.e., observation error) for ObsPack is prescribed in CT based on observation type and geographic characteristics. The observation error for the two Korean datasets also needs to be prescribed for assimilation. The MDM of both AMY and GSN is set to 3 ppm based on several verifications conducted in [43]. Besides AMY and GSN, Tae-ahn Peninsula (TAP) data are already included in the ObsPack. Note that the MDM of TAP is 5 ppm, following [41].
For verification of the results, independent CO2 observations in Asia that are not assimilated in CT are used. These independent observations are aircraft observation data from the National Institute for Environmental Studies (NIES), Japan, called the Comprehensive Observation Network for Trace gases by Airliners (CONTRAIL) (http://doi.org/10.17595/20180208.001; [44,45]), and surface observations from the World Data Centre for Greenhouse Gases (WDCGG, https://gaw.kishou.go.jp/). Tables 1 and 2

Experimental framework
The CT2013B version, which can simulate the surface CO2 flux from 2000 to 2012, is used in this study. Two experiments are conducted to investigate the impact of AMY and GSN observations on surface CO2 flux estimation. The EXP1 experiment assimilates all available observations (AMY and GSN observations and the ObsPack datasets), while CNTL assimilates only ObsPack data. The TM5 model runs on a two-way nested grid with a 3° × 2° outer domain over the globe and a 1° × 1° nested domain centered on Asia (Fig 1). The experimental period is from 2002 to 2011 because both the AMY and GSN datasets are almost complete for this period. The experimental results are analyzed for the 9 years from 2003 to 2011, with the first year (i.e., 2002) treated as spin-up. More details about the experimental settings are summarized in Table 3.

emissions in inland India and a small part of southern China (Fig 2B and 2C). The average flux distributions of CNTL and EXP1 show generally similar patterns (Fig 2B and 2C), but EXP1 shows more absorption than CNTL does in inland southern China, the Korean Peninsula, and Japan (Fig 2D). Locations such as the border region between northern Thailand and China show greater emissions in EXP1 than in CNTL (Fig 2D). Mixed forest areas coincide well with the areas where the DA effects are obvious (Fig 2B, 2C and 2E), which is due to the background error covariance in the EnKF in CT. As each scaling factor is assigned to its respective ecoregion, the background error covariance matrix contains correlations between ecoregions in different Transcom regions. Since the dynamical model in Eq (8) does not include an error term, the background error covariance is set to a prior covariance structure and is not predicted with the dynamical model [29]. According to [34], ecoregions of the same ecosystem type in five different Transcom regions (North American Boreal, North American Temperate, Eurasia Boreal (EB), Eurasia Temperate (ET), and Europe) are correlated, although the correlations become small for distant ecoregions. Through the DA, these correlations allow observations in a certain ecoregion to update the scaling factors of the corresponding ecoregions elsewhere. This explains why the AMY data have a larger effect on specific areas.

PLOS ONE

Fig 3 shows the annual and average surface CO2 fluxes over the globe, the land, and the ocean. Compared to the prior flux, the global CO2 uptake by land vegetation and ocean in CNTL and EXP1 is approximately 2 Pg C yr⁻¹ greater (Fig 3A). Most of this difference in CO2 uptake between the experiments and the prior flux comes from CO2 absorption by the terrestrial vegetation (Fig 3B), while the CO2 absorption by the ocean in CNTL and EXP1 differs only slightly from the prior flux (Fig 3C). When the CNTL and EXP1 results are compared, EXP1 shows slightly more biogenic and slightly less oceanic CO2 flux absorption than CNTL does. The interannual variations of CNTL and EXP1 during the analysis period are very similar, indicating that the two Korean observation datasets assimilated in CT did not disrupt the consistency of the global surface CO2 flux variability. The prior fluxes show the greatest uncertainties over the globe, the land, and the ocean, followed by CNTL and EXP1 (Fig 3A-3C). The 9-year average uncertainty of the prior flux over the globe decreases by 22.5% in CNTL and 24.2% in EXP1 (Fig 3A). The decreases in uncertainty in CNTL and EXP1 relative to the prior flux uncertainty are greater over the land than over the ocean (Fig 3B and 3C).
CNTL and EXP1 estimate -1.08 Pg C yr⁻¹ and -1.27 Pg C yr⁻¹ of CO2 flux for EB (Fig 4A), and -0.43 Pg C yr⁻¹ and -0.61 Pg C yr⁻¹ of CO2 flux for ET (Fig 4B), respectively. In particular, compared to CNTL, EXP1 shows larger biogenic surface CO2 flux absorption in ET since the two added Korean CO2 observation sites are located in ET. The biogenic CO2 flux absorption in EB is also affected by the two observation datasets in ET owing to the background error covariance structure described in section 3.1.1. Assimilation of the AMY and GSN datasets results in larger negative CO2 fluxes in Asia, which implies the possibility of enhanced CO2 absorption as well as weakened CO2 respiration. The prior fluxes show the greatest uncertainties in EB, ET, and Tropical Asia (TA), followed by CNTL and EXP1 (Fig 4A-4C). The 9-year average uncertainty of the prior flux in EB decreases by 19.8% in CNTL and 21.8% in EXP1 (Fig 4A), and that in ET decreases by 18.7% in CNTL and 23.9% in EXP1 (Fig 4B). In TA, CNTL and EXP1 show very similar uncertainties (Fig 4C).

Annual and average surface CO2 flux.
In EXP1, the EB region has the lowest flux absorption in 2005 and 2006 and the greatest flux absorption in 2008 and 2011 (Fig 4A). The ET region has the lowest CO2 uptake in 2003 and 2005 and the greatest in 2011 (Fig 4B). This interannual variability of the biogenic CO2 flux seems to be affected by climate events. It is known that El Niño enhances CO2 sources while La Niña intensifies CO2 sinks [47,48]. Based on the Oceanic Niño Index (ONI) from NCEP (https://origin.cpc.ncep.noaa.gov/products/analysis_monitoring/ensostuff/ONI_v5.php), strong La Niña events occurred during 2007-2008 and 2010-2011, and the biogenic CO2 flux absorption estimated by CT increased during the same periods. Meanwhile, weak El Niño events occurred during 2004-2005 and late 2009, and the biogenic CO2 flux absorption from CT weakened during those periods. Additionally, the extreme drought conditions that occurred in 2003 across the northern midlatitudes [49] resulted in reduced uptake of CO2 [6]. Therefore, assimilating the CO2 observation datasets in Korea reflects the climate effect on the surface CO2 exchange in EB and ET (Fig 4A and 4B). TA has very small CO2 uptake and emission of less than 0.3 Pg C yr⁻¹, irrespective of the experiment (Fig 4C). Surface CO2 flux estimates over the TA region are known to have high uncertainty because there are few observations from which the inverse modeling can capture the signals of sources and sinks [39,50,51]. [52,53] showed that the near-neutral CO2 flux in the tropical region is due to the balance between the CO2 release from deforestation and the CO2 uptake by intact tropical forests. [48] showed that the carbon budget in South Asia and Southeast Asia is close to neutral, with weak signs of a carbon sink. Fig 5 shows the time series of monthly surface CO2 fluxes averaged over the analysis period (i.e., 2003-2011) for the individual Transcom regions in Asia.
In both CNTL and EXP1, a distinct seasonal variation pattern is found in the EB and ET regions (Fig 5A and 5B), in which flux absorption occurs in summer and flux emission occurs from autumn to spring.

Monthly and weekly aggregated surface CO2 flux.
Over the EB region, where vegetation is very active, a flux of approximately -12 Pg C yr⁻¹ (i.e., absorption by the surface) is estimated every summer, with a large surface CO2 flux uncertainty (Fig 5A). Compared to CNTL, the CO2 flux emission in spring decreases in EXP1 in the EB region. The ET region shows a large difference between CNTL and EXP1 (Fig 5B). Compared to CNTL, EXP1 shows stronger flux absorption in summer and weaker flux emission in winter and spring. In particular, the CO2 flux emission in spring in EXP1 is reduced to less than half of that in CNTL. The uncertainties of the surface CO2 flux estimates in CNTL and EXP1 are greatest in summer. In the TA region, there is little difference between CNTL and EXP1 and no distinct seasonal variation, although there is CO2 flux absorption in spring and fall and release in summer (Fig 5C).
Therefore, the assimilation of AMY and GSN observations in CT enhances the monthly surface CO2 flux absorption in summer in the Asia region (especially EB and ET) and decreases the CO2 emission in spring and winter. The seasonal surface CO2 fluxes would change if the seasonal and diurnal variations of MDM were considered. The effect of MDM variations on the estimation of surface CO2 fluxes over Asia is left for future study. Figs 6 and 7 show the weekly cumulative fluxes of each year and their differences from the 9-year average values calculated for EB and ET, respectively. The EB region shows seasonal variation every year, absorbing CO2 strongly in summer and emitting CO2 in spring and winter (Fig 6A and 6C). CNTL shows a decrease in summer CO2 flux absorption in 2003, 2004, 2006, and 2007, whereas it shows strong, above-average CO2 flux absorption in the summers of 2008, 2009, and 2011 (Fig 6B). In EXP1, the flux absorption is much more active during the whole analysis period, since the averaged weekly cumulative flux is shifted in the negative direction (Fig 6C). From 2007 to 2011, the flux absorption in EXP1 is similar to or greater than the 9-year averaged flux (Fig 6D).
The ET region shows seasonal variation similar to that of the EB region, but with about half the magnitude (Fig 7A and 7C). In CNTL, the flux absorption is weaker than the 9-year averaged flux absorption in 2003, 2005, and 2010, whereas the flux absorption is stronger and Fig 7B). The average cumulative CO2 flux for EXP1 is lower than that for CNTL, indicating that EXP1 takes up more CO2 than CNTL does (Fig 7D). In EXP1, strong spring CO2 uptake occurred in 2007 and from 2009 to 2011, which brought the average cumulative flux in the spring period close to zero. In 2010, the CO2 uptake in EXP1 decreased from summer to fall, which is similar to the CNTL result, but EXP1 showed more CO2 flux absorption in spring and early winter.
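The weekly cumulative fluxes and their departures from the multi-year average can be computed along the following lines. This is a minimal sketch assuming one list of weekly regional fluxes per year; the function name, the data layout, and the numbers are hypothetical and do not come from the study.

```python
from itertools import accumulate


def cumulative_and_anomaly(weekly_flux_by_year):
    """Weekly cumulative flux per year and its difference from the mean curve.

    weekly_flux_by_year: dict mapping year to a list of weekly fluxes
    Returns (cumulative per year, multi-year mean curve, anomaly per year).
    """
    cumulative = {yr: list(accumulate(vals))
                  for yr, vals in weekly_flux_by_year.items()}
    n_weeks = min(len(c) for c in cumulative.values())
    mean_curve = [sum(c[w] for c in cumulative.values()) / len(cumulative)
                  for w in range(n_weeks)]
    anomaly = {yr: [c[w] - mean_curve[w] for w in range(n_weeks)]
               for yr, c in cumulative.items()}
    return cumulative, mean_curve, anomaly


# Hypothetical three-week fluxes for two years (negative = absorption).
fluxes = {2003: [1.0, -2.0, -1.0], 2004: [1.0, -4.0, -1.0]}
cumulative, mean_curve, anomaly = cumulative_and_anomaly(fluxes)
```

A negative anomaly at the end of the year then marks a year with stronger net uptake than the multi-year average, which is how the per-year difference panels can be read.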
Overall, more weekly cumulative CO2 absorption is simulated for the terrestrial biosphere in Asia, and the flux differences are more diverse, when the two Korean observation datasets are assimilated in CT. Table 4 summarizes the bias and RMSE of the model CO2 concentrations with respect to the observed CO2 concentrations, and the correlation coefficient between the model and observed CO2 concentrations, for seven independent surface CO2 observation sites in Asia. CNTL shows positive bias for every site except RYO, implying that CNTL generally overestimates the observed CO2 concentration at the evaluation sites. EXP1 also shows positive bias for all sites, but the absolute values are smaller than those of CNTL except at RYO, which indicates that EXP1 estimates more accurate CO2 concentrations than CNTL. The monthly model CO2 concentrations in CNTL and EXP1 are mostly overestimated compared to the monthly observed CO2 concentrations, and the biases are smaller in winter than in summer, indicating better performance of CT in winter (not shown). EXP1 shows smaller biases than CNTL from November to April, except in January.

Verification with surface observations.
The RMSE of CNTL is smaller than that of EXP1 for the COI, HAT, RYO, MNM, and YON sites, which are located on islands or at the seaside. In contrast, the RMSE of EXP1 is smaller than that of CNTL for the LLN and SDZ sites, which are located inland. Averaging over all sites, the

The evaluation with the independent surface CO2 observations indicates that, by assimilating AMY and GSN observations into CT, the bias of the model CO2 concentration can be reduced and the model CO2 concentration can become more accurate than without these observations, particularly over inland vegetated regions. In terms of monthly verification, the bias of the monthly model CO2 concentration can be reduced from winter to early spring by assimilating AMY and GSN observations into CT.
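The three verification scores used in this comparison (bias, RMSE, and correlation coefficient) can be computed as in the sketch below. It assumes that bias is defined as model minus observation, so that a positive value means overestimation, consistent with the wording above; the function name and the sample values are hypothetical.

```python
import math


def verification_stats(model, obs):
    """Return (bias, RMSE, Pearson correlation) of model values against observations."""
    n = len(obs)
    diffs = [m - o for m, o in zip(model, obs)]
    bias = sum(diffs) / n                            # mean of model minus obs
    rmse = math.sqrt(sum(d * d for d in diffs) / n)  # root mean square error
    m_mean, o_mean = sum(model) / n, sum(obs) / n
    cov = sum((m - m_mean) * (o - o_mean) for m, o in zip(model, obs))
    m_ss = sum((m - m_mean) ** 2 for m in model)
    o_ss = sum((o - o_mean) ** 2 for o in obs)
    corr = cov / math.sqrt(m_ss * o_ss)
    return bias, rmse, corr


# Hypothetical CO2 concentrations in ppm at one evaluation site.
model_ppm = [391.0, 393.0, 395.0]
obs_ppm = [390.0, 392.0, 394.0]
bias, rmse, corr = verification_stats(model_ppm, obs_ppm)
```

A constant offset, as in this toy case, gives a nonzero bias and RMSE but a perfect correlation, which is why the three scores are reported together.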

Verification with CONTRAIL aircraft observations.
In this section, the model CO2 concentrations for each experiment are verified against the independent CONTRAIL observations, which are not assimilated in any of the experiments. CONTRAIL observations are categorized into two observation modes: ascending/descending mode and level mode (Fig 8). The vertical observation (ascending/descending mode) is conducted while the aircraft ascends from or descends to the airport.

excluded from the analysis. The method of separating CONTRAIL data into several bins follows [7]. Table 5 shows the RMSE of the model CO2 concentrations for each experiment with respect to the CONTRAIL observations. The average RMSE for the total CONTRAIL data is smaller in EXP1 (1.506 ppm) than in CNTL (1.545 ppm), and that for the ascending/descending mode is also smaller in EXP1 (1.581 ppm) than in CNTL (1.643 ppm). At bins 3, 4, 7, and 8, EXP1 shows smaller RMSEs than CNTL does at each level. At bin 5, EXP1 shows a smaller RMSE than CNTL does only at level 4. At bin 6, CNTL shows smaller RMSEs than EXP1 does at levels 1, 2, and 4. In the level mode, the RMSE of EXP1 (1.414 ppm) is also smaller than that of CNTL (1.424 ppm).
When all levels of the ascending/descending mode observations are considered, bins 3, 4, 5, 7, and 8 have smaller RMSEs in EXP1, while bin 6 has a smaller RMSE in CNTL (Fig 8).

Overall, the evaluation with the independent CONTRAIL observations shows that assimilating the additional Korean observations into CT leads to more accurate surface CO2 flux estimates over Asia.

Uncertainty reduction
Fig 9 shows the uncertainty reduction rate of each experiment for the estimated posterior surface CO2 flux relative to the prior surface CO2 flux, averaged over the analysis period. The area of maximum uncertainty reduction in CNTL appears over parts of inner China, Mongolia, and central Asia, with a reduction of approximately 40%. It is followed by the Siberian region near 60°N, with an uncertainty reduction of approximately 28% after the optimization (Fig 9A). For EXP1, the uncertainty reduction is approximately 59% over Eastern China, Korea, and Japan, an area that coincides with the mixed forest ecoregion of ET (Fig 9B). The uncertainty reduction rate around the Siberian region near 60°N is also higher in EXP1 than in CNTL, with a decrease of nearly 35%. Some parts of India show a 27% uncertainty reduction in EXP1.
The difference in uncertainty reduction between EXP1 and CNTL is shown in Fig 9C. Compared to CNTL, EXP1 shows more uncertainty reduction over Eastern China, Korea, Japan, and India, which are located in the ET region. In Siberia, there is no distinct difference in uncertainty reduction between CNTL and EXP1. The EB and TA regions also show little difference between EXP1 and CNTL. Therefore, the uncertainties in the estimated surface CO2 flux over ET are reduced by adding the two Korean observation datasets to CT.
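The exact definition of the uncertainty reduction rate is not given in this section. A commonly used form, assumed here for illustration only, is the relative decrease of the posterior flux uncertainty with respect to the prior flux uncertainty, expressed in percent. The function name and the numbers below are hypothetical.

```python
def uncertainty_reduction(sigma_prior, sigma_post):
    """Percent reduction of flux uncertainty (standard deviation) after optimization."""
    if sigma_prior <= 0:
        raise ValueError("prior uncertainty must be positive")
    return (1.0 - sigma_post / sigma_prior) * 100.0


# Hypothetical grid-cell uncertainties in arbitrary flux units.
reduction_cntl = uncertainty_reduction(2.0, 1.2)  # about 40 %
reduction_exp1 = uncertainty_reduction(2.0, 0.8)  # about 60 %
difference = reduction_exp1 - reduction_cntl      # extra reduction in EXP1
```

Under this definition, the difference map between two experiments that share the same prior is simply the difference of their two percentages.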

Influence matrix
An influence matrix is used to measure the influence of the assimilated observation data on the model result. [6,41,54] calculated the influence of assimilated surface CO2 observation data on the surface CO2 flux estimated in CT. The diagonal component of the influence matrix, known as the self-sensitivity or observation impact, represents the impact of an observation on the model value at its own observation site. Fig 10 shows the self-sensitivities of the surface CO2 observations assimilated in EXP1, averaged over the analysis period 2003-2011. AMY, GSN, and some overlapping sites are marked as circles of different sizes and colors. The spatial density of observation sites is high in North America and Europe. The number of observation sites is relatively small in Asia, Australia, other continents, and the oceans. Large self-sensitivities are found around observation-sparse regions. The self-sensitivities are fairly evenly distributed where observation sites are dense, although some sites in Alaska, Western America, and Northern Europe show large self-sensitivities. The global average self-sensitivity in EXP1 is 6.14%. The self-sensitivities of AMY and GSN are 11.71% and 11.38%, respectively, which are larger than the global average. Given that the self-sensitivity of TAP is 2.7%, the relatively large self-sensitivities of AMY and GSN imply that the AMY and GSN observations play a more important role in producing the optimized surface CO2 flux. AMY and GSN observations also help estimate the surface CO2 flux in Asia, where the observation density is low.
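For an ensemble filter, the self-sensitivity of one observation can be estimated from the analysis ensemble in observation space. The sketch below assumes the usual form of the influence matrix for Kalman-type analyses, in which the diagonal element equals the analysis error variance in observation space divided by the observation error variance. The function name and the numbers are hypothetical, and this is not the code used to produce Fig 10.

```python
def self_sensitivity(analysis_hx, obs_error_var):
    """Self-sensitivity of one observation from analysis ensemble values.

    analysis_hx:   modeled values of this observation for each analysis member
    obs_error_var: observation error variance (square of the MDM)
    """
    n = len(analysis_hx)
    mean = sum(analysis_hx) / n
    # Sample variance of the analysis ensemble in observation space.
    var_a = sum((h - mean) ** 2 for h in analysis_hx) / (n - 1)
    return var_a / obs_error_var


# Hypothetical three-member analysis ensemble (ppm) and error variance (ppm^2).
s = self_sensitivity([0.9, 1.0, 1.1], obs_error_var=0.04)
percent = 100.0 * s
```

A value near 0 means the analysis at that site is determined almost entirely by the background, while a value near 1 (100%) means it is determined almost entirely by the observation itself.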

Summary and conclusion
In this study, two CO2 observation datasets from the AMY and GSN sites in the Korean Peninsula are introduced into the CT DA system, and the effect of the observations on surface CO2 flux estimation in Asia is investigated for the 9-year period from 2003 to 2011. The annual average surface CO2 uptake over East Asia is enhanced in the EXP1 experiment, in which AMY and GSN observations are assimilated. With the assimilation of AMY and GSN observations, ET regions including the Korean Peninsula, Japan, and inland China show stronger CO2 absorption in summer and weaker CO2 emission in spring and autumn. EB regions show a similar pattern.
Independent surface and aircraft CO2 observations are used to verify the experimental results. Assimilating the two additional observation datasets into CT reduced the root mean square error of the modeled CO2 concentration with respect to the independent CO2 observations and enhanced the uncertainty reduction when optimizing the surface CO2 flux in the Asia region. The regions with small RMSEs are consistent with the regions with significant uncertainty reduction, which include the Korean Peninsula, southern inland China, eastern China, and Japan.
Self-sensitivities at AMY and GSN are relatively high, which indicates that the two observation sites in Korea (AMY and GSN) are considerably important in estimating the surface CO2 flux in Asia. The use of CO2 observations in the Korean Peninsula is expected to contribute greatly not only to the estimation of surface CO2 flux in Asia at various scales, but also to the refinement of the national emission inventory.